The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
Basri, Ronen, Jacobs, David, Kasten, Yoni, Kritchman, Shira
We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. We derive the corresponding eigenvalues for each frequency after introducing a bias term in the model. This bias term had been omitted from the linear network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low frequency functions with odd frequencies. Our results lead to specific predictions of the time it will take a network to learn functions of varying frequency. These predictions match the empirical behavior of both shallow and deep networks.
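As a rough illustration of the frequency dependence described above, the following sketch (not the authors' code; the network width, learning rate, error threshold, and the sinusoidal targets sin(k*theta) are illustrative assumptions) trains a shallow ReLU network with bias on points distributed on the unit circle and records how many gradient steps each frequency needs to reach a fixed error:

import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training inputs uniformly distributed on the unit circle (normalized data on a hypersphere).
n = 256
theta = torch.rand(n) * 2 * math.pi
X = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)

def steps_to_fit(k, width=2048, lr=0.1, tol=0.05, max_steps=50000):
    # Shallow ReLU network WITH bias terms, trained by full-batch gradient descent
    # on the target sin(k * theta); returns the number of steps to reach MSE < tol.
    y = torch.sin(k * theta).unsqueeze(1)
    net = nn.Sequential(nn.Linear(2, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for step in range(max_steps):
        loss = ((net(X) - y) ** 2).mean()
        if loss.item() < tol:
            return step
        opt.zero_grad()
        loss.backward()
        opt.step()
    return max_steps

for k in (1, 2, 4, 8):
    print(f"frequency k={k}: reached MSE < 0.05 after {steps_to_fit(k)} steps")

Under the linearized picture summarized in the abstract, the recorded step counts should grow with the frequency k, i.e., higher-frequency targets converge more slowly.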
Reviews: The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
Which functions NNs learn (i.e., which function they approximate) and how fast they learn them are central questions in the study of the dynamics of (deep) NNs. A common conception behind this problem is that training a network longer than necessary may lead to overfitting. However, the definition of overfitting appears to vary from paper to paper. Moreover, overfitting is intimately linked with another hot topic in the area: over-parametrization. Please refer to "Advani & Saxe 2017, High-Dimensional Dynamics of Generalization Error for NNs" for a modern take on this link. Keeping this link in mind, we focus on fixed-size networks.
How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer
Heiss, Jakob, Teichmann, Josef, Wutte, Hanna
Randomized neural networks (randomized NNs), in which only the terminal layer's weights are optimized, constitute a powerful model class for reducing the computational cost of training neural networks. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a Generalized Additive Model (GAM)-type regression in which infinitely many directions are considered: the Infinite Generalized Additive Model (IGAM). The IGAM is formalized as the solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work extends to multivariate NNs the results of [9], where we showed how wide RSNs with ReLU activation behave like spline regression under certain conditions when the input dimension is d = 1.
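A minimal sketch of this model class, under assumed toy data and hyperparameters (this is not the authors' implementation): the first layer of a shallow ReLU network is drawn at random and frozen, and only the terminal layer's weights are fit, here in closed form by ridge regression on the random features.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data in d = 2 dimensions (an illustrative target, not from the paper).
n, d, width = 200, 2, 2000
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(n)

# Random, frozen first layer: ReLU features that are never trained.
W = rng.standard_normal((d, width))
b = rng.standard_normal(width)
features = np.maximum(X @ W + b, 0.0)

# Only the terminal layer is optimized, here in closed form by ridge regression.
lam = 1e-3
w_out = np.linalg.solve(features.T @ features + lam * np.eye(width), features.T @ y)

# Evaluate the fitted RSN on fresh inputs.
X_test = rng.uniform(-1.0, 1.0, size=(1000, d))
y_test = np.sin(3.0 * X_test[:, 0]) + 0.5 * X_test[:, 1] ** 2
pred = np.maximum(X_test @ W + b, 0.0) @ w_out
print("test RMSE:", np.sqrt(np.mean((pred - y_test) ** 2)))

Because only the last layer is trained, fitting reduces to a linear least-squares problem in the random ReLU features, which is what makes RSNs cheap to train relative to fully trained networks.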